INTRODUCTION



Although analysis of variance is the most popular statistical tool for
researchers in the behavioral sciences, it is only recently that the casual
user has recognized that there is no single correct way to perform such an
analysis. For a given factorial design with equal cell frequencies, older
textbooks uniformly described exactly the same procedure. Rarely was the
topic of unbalanced data discussed.

Within the past decade, there has developed an increased awareness that 
analysis of variance can be viewed as a special case of regression analysis. 
This more general approach has made it clear that, for unbalanced designs, 
there is no unique way to perform analysis of variance. For example, Overall
and Klett (1972) and Kerlinger and Pedhazur (1973) point out several different
ways to calculate a main effect sum of squares. However, they do not provide
adequate advice on how to choose among the various sums of squares.

A substantial literature is available to help in the selection of the 
most appropriate method of analysis. Unfortunately, much of this literature 
is very esoteric and is thus ineffective as an aid to the non-statistician. 
It is the purpose of this paper to integrate the literature into a comprehen- 
sive comparison of alternative methods of performing a two-way, fixed effects 
analysis of variance. 



THE MODEL 



The two-way (A x B) analysis of variance is based on the model

    $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$    (1)

where $i = 1, \ldots, a$; $j = 1, \ldots, b$; and $k = 1, \ldots, n_{ij}$. Here,
$n_{ij} > 0$ denotes the cell frequencies.

Using the notation of Searle (1971), we can denote as $R(\mu, \alpha, \beta, \gamma)$
the reduction in the sum of squares due to the model in (1). Similarly,
we could define a reduced model

    $Y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}$    (2)

and $R(\mu, \alpha, \beta)$ would then denote the reduction in the sum of squares
due to this new model. The difference between these reductions is denoted

    $R(\gamma \mid \mu, \alpha, \beta) = R(\mu, \alpha, \beta, \gamma) - R(\mu, \alpha, \beta)$    (3)

and expresses the reduction due to fitting $\gamma$ over and above $\mu$, $\alpha$, and $\beta$.
This would commonly be considered as the sum of squares due to interaction.
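As an illustration of (3), the reductions can be obtained numerically by fitting the full and the reduced model by least squares and differencing the sums of squares of the fitted values. The sketch below is only a minimal example under stated assumptions: the 2 x 3 data set, the helper names `indicators` and `reduction`, and the use of NumPy's minimum-norm least squares solution are all choices made here for illustration, not part of the original analysis.

```python
import numpy as np

# Hypothetical unbalanced 2 x 3 layout: cells[i][j] holds the observations in cell (i, j).
cells = [[[3.0, 5.0], [6.0], [8.0, 9.0, 10.0]],
         [[4.0], [7.0, 9.0, 8.0], [12.0, 14.0]]]
a, b = 2, 3

rows = [(i, j, obs) for i in range(a) for j in range(b) for obs in cells[i][j]]
A = np.array([r[0] for r in rows])
B = np.array([r[1] for r in rows])
y = np.array([r[2] for r in rows])

def indicators(levels, k):
    """0/1 indicator columns for a factor with k levels."""
    return (levels[:, None] == np.arange(k)).astype(float)

def reduction(X, y):
    """R(.): sum of squares of the fitted values from a least squares fit."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm solution if X is rank deficient
    fitted = X @ theta
    return float(fitted @ fitted)

one = np.ones((y.size, 1))
XA, XB = indicators(A, a), indicators(B, b)
XAB = indicators(A * b + B, a * b)

R_full = reduction(np.hstack([one, XA, XB, XAB]), y)   # R(mu, alpha, beta, gamma)
R_add = reduction(np.hstack([one, XA, XB]), y)         # R(mu, alpha, beta)
R_gamma = R_full - R_add                               # R(gamma | mu, alpha, beta), as in (3)
print(R_full, R_add, R_gamma)
```

Because the full model reproduces the cell means, `R_full` equals the sum over cells of $n_{ij}\bar{y}_{ij.}^2$, which gives a quick check on the arithmetic.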

In an analogous fashion, we could calculate $R(\alpha \mid \mu, \beta, \gamma)$, $R(\alpha \mid \mu, \beta)$
or $R(\alpha \mid \mu)$. Any, or all, of these could, under the correct conditions, be
interpreted as a sum of squares for the "A" main effect. How is one to
choose? What are the "correct conditions" under which all of these may be
interpreted as a SS(A)?

The answers to these questions lie deeply embedded in the interpretations
one makes of the parameters in the full model (1). It is a common
misconception that $\mu$, $\alpha_i$, $\beta_j$ and $\gamma_{ij}$ must be interpreted as a grand mean,
A effect, B effect and interaction, respectively. Under certain restrictions
on the model, these interpretations are correct. However, such restrictions
need not be imposed to derive the analysis.

THE UNRESTRICTED MODEL
The analysis of variance based on the model (1) with no restrictions
imposed is fully developed by Searle (1971). Since the normal equations
for the unrestricted model have no unique least squares solution, Searle
makes use of the concept of a generalized inverse.

Using matrix notation, the model can be re-expressed as

    $y = X\theta + e$,    (4)

where $y$ is an $N \times 1$ vector of the $Y_{ijk}$; $\theta$ contains the parameters
$\{\mu, \alpha_i, \beta_j, \gamma_{ij}\}$; $e$ is an $N \times 1$ vector of the $e_{ijk}$; and $X$ is a design
matrix consisting entirely of zeros and ones. Normal equations can now be
expressed as

    $(X'X)\,\hat{\theta} = X'y$,    (5)

where $\hat{\theta}$ represents some least squares solution. Since $X'X$ is singular,
we resort to a generalized inverse, $G$, which satisfies

    $(X'X)\,G\,(X'X) = X'X$,    (6)

so that

    $\hat{\theta} = G X'y$.    (7)

Searle makes elegant use of the properties of generalized inverses to
show that the reduction due to the model is invariant to the choice of $G$.
However, choosing a specific $G$ is equivalent to placing constraints on the
solution $\hat{\theta}$. It is critical at this point to recognize that the choice of
constraints on the solution is a matter of convenience and in no way forces
similar restrictions on the parameters of the model.
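The invariance of the reduction to the choice of generalized inverse can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2 x 3 set of cell frequencies and simulated responses; the two generalized inverses used (the Moore-Penrose inverse, and one built by inverting only the block of $X'X$ belonging to the $\gamma_{ij}$ columns) are simply two convenient choices satisfying (6), not the specific inverse used by Searle.

```python
import numpy as np

# Hypothetical cell frequencies n_ij for a 2 x 3 layout (an assumption for illustration).
n = np.array([[2, 1, 3],
              [1, 3, 2]])
a, b = n.shape
cell = np.repeat(np.arange(a * b), n.ravel())   # cell index of each observation
A, B = cell // b, cell % b
y = np.random.default_rng(0).normal(size=cell.size)   # arbitrary simulated responses

def indicators(levels, k):
    """0/1 indicator columns for a factor with k levels."""
    return (levels[:, None] == np.arange(k)).astype(float)

# Overparameterized design matrix: columns for mu, alpha_i, beta_j, gamma_ij.
X = np.hstack([np.ones((y.size, 1)), indicators(A, a),
               indicators(B, b), indicators(cell, a * b)])
M = X.T @ X

G1 = np.linalg.pinv(M)                  # Moore-Penrose inverse
G2 = np.zeros_like(M)                   # second g-inverse: invert the gamma block only
k = 1 + a + b
G2[k:, k:] = np.linalg.inv(M[k:, k:])

theta1 = G1 @ X.T @ y                   # two different solutions of the normal equations
theta2 = G2 @ X.T @ y
R1 = float(theta1 @ X.T @ y)            # reduction due to the model, theta_hat' X'y
R2 = float(theta2 @ X.T @ y)
print(R1, R2)
```

The two solution vectors differ (the second sets $\hat{\mu}$, the $\hat{\alpha}_i$ and the $\hat{\beta}_j$ to zero), yet the reductions agree, which is the sense in which constraints on the solution do not restrict the parameters of the model.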

Using generalized inverses to solve normal equations for the full and
a variety of reduced models, it is simple to derive such expressions as
$R_0(\gamma \mid \mu, \alpha, \beta)$, $R_0(\alpha \mid \mu, \beta, \gamma)$, $R_0(\alpha \mid \mu, \beta)$ and $R_0(\alpha \mid \mu)$. The subscript
0 on $R_0(\ )$ will indicate a reduction based on an unrestricted model.
Computational formulas and hypotheses tested by these sums of squares are given
in Table 1.



Insert Table 1 about here 



Observe one strange occurrence in Table 1:

    $R_0(\alpha \mid \mu, \beta, \gamma) = 0$

This startling outcome results from the fact that the design matrices for
the full model and for the reduced model

    $Y_{ijk} = \mu + \beta_j + \gamma_{ij} + e_{ijk}$    (8)

span exactly the same vector space. In other words, since the full model has
far more parameters than can be used, the elimination of the $\alpha_i$ parameters
does not reduce the effectiveness of the model. In fact, the elimination of
$\mu$, $\alpha_i$ and $\beta_j$ would not change the reduction in the sum of squares due to the
model. Thus, the reduction due to any effect, over and above the $\gamma_{ij}$, is
uninteresting in the unrestricted model.
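The claim that the full model and the reduced model (8) span the same vector space can be verified with a rank computation. The sketch below assumes a hypothetical 2 x 3 layout with no empty cells and simulated responses; the helper names are illustrative only.

```python
import numpy as np

n = np.array([[2, 1, 3],
              [1, 3, 2]])                        # hypothetical cell frequencies, none empty
a, b = n.shape
cell = np.repeat(np.arange(a * b), n.ravel())
A, B = cell // b, cell % b
y = np.random.default_rng(1).normal(size=cell.size)   # arbitrary simulated responses

def indicators(levels, k):
    """0/1 indicator columns for a factor with k levels."""
    return (levels[:, None] == np.arange(k)).astype(float)

def reduction(X, y):
    """Sum of squares of the least squares fitted values."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ theta
    return float(fitted @ fitted)

one = np.ones((y.size, 1))
XA, XB, XAB = indicators(A, a), indicators(B, b), indicators(cell, a * b)
X_full = np.hstack([one, XA, XB, XAB])           # model (1)
X_red = np.hstack([one, XB, XAB])                # model (8): alpha columns removed

rank_full = np.linalg.matrix_rank(X_full)
rank_red = np.linalg.matrix_rank(X_red)
rank_both = np.linalg.matrix_rank(np.hstack([X_full, X_red]))
R0_alpha = reduction(X_full, y) - reduction(X_red, y)   # R_0(alpha | mu, beta, gamma)
print(rank_full, rank_red, rank_both, R0_alpha)
```

All three ranks equal the number of cells, $ab$, so neither matrix adds anything to the column space of the other, and the difference in reductions is zero up to rounding.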

Again referring to Table 1, note that the hypotheses being tested are
complex and, most likely, not very interpretable. Furthermore, the hypotheses
corresponding to $R_0(\alpha \mid \mu)$ and $R_0(\alpha \mid \mu, \beta)$ depend upon the possibly
arbitrary configuration of cell frequencies. These findings provide little
direction in the choice of a sum of squares for the A-effect. Is it possible that
some form of restrictions imposed on the model might simplify these hypotheses?
Will imposition of these restrictions change the reduction sums of squares? What
type of restrictions should be chosen?

THE RESTRICTED MODEL 
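To answer these questions, the model under restrictions will be investigated.
Restrictions can be imposed in a very general way: for an arbitrary
set of numbers $\{v_i;\ i = 1, \ldots, a\}$ and $\{w_j;\ j = 1, \ldots, b\}$ where
$\sum_i v_i = \sum_j w_j = 1$, we can impose restrictions of the form

    $\sum_i v_i \alpha_i = \sum_j w_j \beta_j = \sum_i v_i \gamma_{ij} = \sum_j w_j \gamma_{ij} = 0$.    (9)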

Denoting as $R_r(\ )$ the reduction in the sum of squares due to a model
restricted under (9), it is possible to show, using results from Scheffé
(1959), that $R_r(\gamma \mid \mu, \alpha, \beta) = R_0(\gamma \mid \mu, \alpha, \beta)$ and $R_r(\alpha \mid \mu, \beta) = R_0(\alpha \mid \mu, \beta)$.
Although Scheffé does not consider the situation, it is easily shown that
$R_r(\alpha \mid \mu) = R_0(\alpha \mid \mu)$. However, Scheffé does show that, in general,
$R_r(\alpha \mid \mu, \beta, \gamma)$ does not equal zero as does $R_0(\alpha \mid \mu, \beta, \gamma)$; furthermore,
$R_r(\alpha \mid \mu, \beta, \gamma)$ depends upon the choice of the weights $\{w_j\}$. For this
reason, two commonly used sets of restrictions will be explored. The first
set of restrictions, defined by the weights given in set $S_1$, are

    $S_1 = \{v_i = 1/a,\ w_j = 1/b\}$.    (10)



This implies that

    $\alpha_{.} = \beta_{.} = \gamma_{i.} = \gamma_{.j} = 0$    (11)

where the dot notation signifies summing over the subscript replaced by the
dot. A second restriction is defined by the weights

    $S_2 = \{v_i = n_{i.}/N,\ w_j = n_{.j}/N\}$,    (12)

which in turn imposes the restrictions

    $\sum_i n_{i.} \alpha_i = \sum_j n_{.j} \beta_j = \sum_i n_{i.} \gamma_{ij} = \sum_j n_{.j} \gamma_{ij} = 0$.    (13)

It should be clear that the manner in which the parameters of the model
relate to the cell means, $\mu_{ij}$, depends upon the choice of a restriction.
When no restrictions are imposed, no unique functional relationship exists
between the model's parameters and the cell means. It is informative to
express at least one set of parameters, say the $\alpha_i$, in terms of the cell
means to see the implications of the two restrictions. Under $S_1$, we find
that

    $\alpha_i^{(1)} = \frac{1}{b} \sum_j \mu_{ij} - \frac{1}{ab} \sum_i \sum_j \mu_{ij}$.

Under $S_2$, the $\alpha_i$ are defined

    $\alpha_i^{(2)} = \frac{1}{N} \sum_j n_{.j} \mu_{ij} - \frac{1}{N^2} \sum_i \sum_j n_{i.} n_{.j} \mu_{ij}$.

Situations obviously exist where the $\alpha_i^{(1)}$ are all zero, but the $\alpha_i^{(2)}$ are
not. For this reason, caution must be exercised when testing a hypothesis
of the form

    $H_0$: $\alpha_i = 0$ for all $i$.
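A small numerical case makes the distinction concrete. Under restrictions of the form (9), averaging the cell means over $j$ with weights $w_j$ leaves $\mu + \alpha_i$, and averaging again over $i$ with weights $v_i$ leaves $\mu$, so $\alpha_i$ is the difference of the two averages. The sketch below applies this to hypothetical cell means and cell frequencies chosen here so that the unweighted row averages coincide; the function name `alpha_from_cell_means` is an illustrative assumption.

```python
import numpy as np

def alpha_from_cell_means(mu, v, w):
    """alpha_i = sum_j w_j mu_ij - sum_i sum_j v_i w_j mu_ij, for weights v, w summing to 1."""
    mu = np.asarray(mu, dtype=float)
    row_avg = mu @ w                  # weighted average of each row of cell means
    return row_avg - v @ row_avg      # subtract the doubly weighted grand average

mu = np.array([[1.0, 2.0, 3.0],
               [3.0, 2.0, 1.0]])      # hypothetical cell means
n = np.array([[2, 1, 3],
              [1, 3, 2]])             # hypothetical cell frequencies
a, b = mu.shape
N = n.sum()

alpha_1 = alpha_from_cell_means(mu, np.full(a, 1 / a), np.full(b, 1 / b))   # equal weights
alpha_2 = alpha_from_cell_means(mu, n.sum(axis=1) / N, n.sum(axis=0) / N)   # frequency weights
print(alpha_1)
print(alpha_2)
```

For these values the equally weighted parameters are both zero (to rounding), while the frequency weighted parameters are 1/6 and -1/6, so a hypothesis that "all $\alpha_i$ are zero" is true under one definition and false under the other.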

Scheffé proves that, for any set of weights,

    $R_r(\alpha \mid \mu, \beta, \gamma) = \sum_i W_i A_i^2 - \left(\sum_i W_i\right)^{-1} \left(\sum_i W_i A_i\right)^2$,    (14)

where $W_i = \left[\sum_j w_j^2 / n_{ij}\right]^{-1}$ and $A_i = \sum_j w_j \bar{y}_{ij.}$. Substituting $S_1$ in
(14) and denoting the reduction as $R_1(\ )$, simple algebra yields

    $R_1(\alpha \mid \mu, \beta, \gamma) = \sum_i W_i A_i^2 - \left(\sum_i W_i\right)^{-1} \left(\sum_i W_i A_i\right)^2$,
    with $W_i = b^2 \left[\sum_j 1/n_{ij}\right]^{-1}$ and $A_i = \frac{1}{b} \sum_j \bar{y}_{ij.}$.    (15)

This sum of squares is identical to that calculated by the weighted squares
of means method (see Searle (1971, pp. 369-372) or Winer (1971, pp. 417-418)).
Applying frequency weighted restrictions as given in $S_2$, we obtain
from (14) that

    $R_2(\alpha \mid \mu, \beta, \gamma) = \sum_i W_i A_i^2 - \left(\sum_i W_i\right)^{-1} \left(\sum_i W_i A_i\right)^2$,
    with $W_i = N^2 \left[\sum_j n_{.j}^2 / n_{ij}\right]^{-1}$ and $A_i = \frac{1}{N} \sum_j n_{.j} \bar{y}_{ij.}$.    (16)

This sum of squares is not in common use. The formulas for $R_r(\gamma \mid \mu, \alpha, \beta)$,
$R_r(\alpha \mid \mu)$, $R_r(\alpha \mid \mu, \beta)$ and $R_r(\alpha \mid \mu, \beta, \gamma)$ and the hypotheses they test are given,
for both $S_1$ and $S_2$, in Table 2. Parameters are superscripted to reinforce
their differences.

Insert Table 2 about here

Several things should be noted. First, the definitions of the parameters
are not the same under $S_1$ as under $S_2$. Thus, even when a hypothesis
under $S_1$ appears to be identical to that under $S_2$, the two may not be
equivalent (i.e., imply each other). Scheffé (1959) shows that the interaction
hypotheses are equivalent under the two sets. Assuming zero interactions,
the hypothesis for $R_1(\alpha \mid \mu)$ implies that for $R_2(\alpha \mid \mu)$. Similarly,
the two hypotheses for $R_r(\alpha \mid \mu, \beta)$ are conditionally equivalent. However,
the hypotheses for $R_r(\alpha \mid \mu, \beta, \gamma)$ are not equivalent and this is reinforced by
[...] dependent upon cell frequencies and the hypothesis for $R_r(\alpha \mid \mu, \beta)$ simplifies
only when expressed conditionally. In summary, imposing restrictions does
simplify the hypotheses in Table 1, but the appropriate choice of a main effect
SS remains unclear.
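Scheffé's formula (14) is easy to evaluate directly from the cell means and cell frequencies. The sketch below, on a hypothetical unbalanced 2 x 3 data set, computes it for equal weights and for frequency weights. As a cross-check that is an addition made here (it is not part of Scheffé's derivation), the equal-weight value is compared with a regression reduction computed under sum-to-zero (effect) coding, which is a common way of obtaining the weighted squares of means result when no cell is empty. All function names are illustrative.

```python
import numpy as np

# Hypothetical unbalanced 2 x 3 data (illustrative values only).
cells = [[[3.0, 5.0], [6.0], [8.0, 9.0, 10.0]],
         [[4.0], [7.0, 9.0, 8.0], [12.0, 14.0]]]
n = np.array([[len(c) for c in row] for row in cells], dtype=float)
ybar = np.array([[np.mean(c) for c in row] for row in cells])
a, b = n.shape
N = n.sum()

def scheffe_ss_a(ybar, n, w):
    """Formula (14): sum W_i A_i^2 - (sum W_i A_i)^2 / sum W_i."""
    W = 1.0 / (w**2 / n).sum(axis=1)
    A_hat = ybar @ w
    return float((W * A_hat**2).sum() - (W * A_hat).sum()**2 / W.sum())

R1 = scheffe_ss_a(ybar, n, np.full(b, 1.0 / b))   # equal weights, w_j = 1/b
R2 = scheffe_ss_a(ybar, n, n.sum(axis=0) / N)     # frequency weights, w_j = n_.j / N

# Cross-check of R1 by regression with effect coding (an assumption of this sketch).
rows = [(i, j, obs) for i in range(a) for j in range(b) for obs in cells[i][j]]
Ai = np.array([r[0] for r in rows])
Bj = np.array([r[1] for r in rows])
y = np.array([r[2] for r in rows])

def effect_code(levels, k):
    """Sum-to-zero coding: the last level is coded -1 in every column."""
    E = (levels[:, None] == np.arange(k - 1)).astype(float)
    E[levels == k - 1] = -1.0
    return E

def fitted_ss(X, y):
    """Sum of squares of the least squares fitted values."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    f = X @ theta
    return float(f @ f)

EA, EB = effect_code(Ai, a), effect_code(Bj, b)
EAB = np.hstack([EA[:, [p]] * EB for p in range(a - 1)])
one = np.ones((y.size, 1))
R1_regression = (fitted_ss(np.hstack([one, EA, EB, EAB]), y)
                 - fitted_ss(np.hstack([one, EB, EAB]), y))
print(R1, R1_regression, R2)
```

For these data the equal-weight value works out to 108/11, the regression cross-check agrees with it, and the frequency-weighted value is different, illustrating the dependence on the choice of weights.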
CELL MEAN MODEL

If $\mu_{ij}$ denotes a cell mean, the model in (1) can be expressed

    $E(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij} = \mu_{ij}$.    (17)

In the absence of restrictions, $\mu$, $\alpha_i$, $\beta_j$ and $\gamma_{ij}$ have no unique
dependence upon the $\mu_{ij}$. Imposing restrictions of the form (9) allows one
to express these parameters as functions of the cell means. However, some
statisticians prefer to eliminate the problem of imposing restrictions by
writing the model entirely in terms of cell means. This form of the model
actually predates the overparameterized version.

The cell mean model is discussed briefly by Searle (1971) and in great
depth by Timm and Carlson (1973). In a short and well written article by
Kutner (1974), the hypotheses and corresponding SS of primary interest are
developed. Table 3 presents some of Kutner's null hypotheses, along with
their equivalent reductions.



Insert Table 3 about here 



Hypothesis H^ is equivalent to the hypothesis tested by $R(\gamma \mid \mu, \alpha, \beta)$.
Hypothesis H^ is simply a re-writing of that tested by $R(\alpha \mid \mu)$. The
reduction $R(\alpha \mid \mu, \beta)$ can be viewed as testing the conditional hypothesis

H : a. are all equal ) . . are all equal, 
and this is equivalent to H^ above. The hypothesis tested by $R_1(\alpha \mid \mu, \beta, \gamma)$
is easily shown to be equivalent to H^. Searle (1971, p. 315, eq. 122) also
derives a statistic, equal in value to $R_1(\alpha \mid \mu, \beta, \gamma)$, but not readily expressed
as a reduction. In the unrestricted model, the statistic tests the hypothesis

    H: $\alpha_i + \bar{\gamma}_{i.}$ are all equal.

PROPORTIONAL AND BALANCED DATA

It should be clear that $R(\gamma \mid \mu, \alpha, \beta)$ is our only rational choice for the
interaction SS. Of the possible choices of the SS(A), $R(\alpha \mid \mu)$ appears to be
clearly undesirable. The other two reductions, $R(\alpha \mid \mu, \beta)$ and $R(\alpha \mid \mu, \beta, \gamma)$,
each have drawbacks. $R(\alpha \mid \mu, \beta)$ tests a very complicated hypothesis in the
overparameterized, unrestricted model. In the restricted or in the cell
mean model, it tests a conditional hypothesis. The reduction $R(\alpha \mid \mu, \beta, \gamma)$
tests nothing in the unrestricted model and tests different hypotheses and
assumes different values in the restricted models, depending upon the
restriction. How do these different reductions operate for data which are
proportional?

Cell frequencies are said to be proportional when

    $n_{ij} = n_{i.} n_{.j} / N$.    (18)

In this situation, we find that

    $R(\alpha \mid \mu) = R(\alpha \mid \mu, \beta) = R_2(\alpha \mid \mu, \beta, \gamma) \neq R_1(\alpha \mid \mu, \beta, \gamma)$.    (19)

These reductions and their hypotheses are given in Table 4.



Insert Table 4 about here

Notice that $R(\alpha \mid \mu)$ and $R(\alpha \mid \mu, \beta)$ are equal, regardless of which restrictions
are imposed or even when no restrictions are imposed. The reduction
$R_2(\alpha \mid \mu, \beta, \gamma)$ equals this common value, but $R_1(\alpha \mid \mu, \beta, \gamma)$ does not.
The proportionality of the cell frequencies does not substantially
simplify the hypotheses tested by $R_0(\alpha \mid \mu)$ and $R_0(\alpha \mid \mu, \beta)$. The same is true
for $R_1(\alpha \mid \mu)$ and $R_1(\alpha \mid \mu, \beta)$. However, under restrictions $S_2$, the reductions
$R_2(\alpha \mid \mu)$, $R_2(\alpha \mid \mu, \beta)$ and $R_2(\alpha \mid \mu, \beta, \gamma)$ all test the simple hypothesis that the
$\alpha_i^{(2)}$ are all zero. The reduction $R_1(\alpha \mid \mu, \beta, \gamma)$ also tests that the $\alpha_i^{(1)}$ are
all zero but the $\alpha_i^{(1)}$ have a different meaning, and thus the hypothesis must
be viewed as distinct from that for $R_2(\alpha \mid \mu, \beta, \gamma)$.

The common value of the reductions given in (19) will equal the SS(A)
calculated from the special formulas for proportional data. Such formulas
are given in standard experimental design books such as Kirk (1968, p. 201).
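The relations among the reductions for proportional data can be checked on a small example. The sketch below assumes hypothetical proportional frequencies and simulated responses; $R(\alpha \mid \mu)$ and $R(\alpha \mid \mu, \beta)$ are computed as differences of fitted sums of squares, and the two restricted reductions from Scheffé's weighted formula with frequency weights and with equal weights. Helper names are illustrative.

```python
import numpy as np

n = np.array([[1, 2, 3],
              [2, 4, 6]])                       # hypothetical proportional frequencies
a, b = n.shape
N = n.sum()
cell = np.repeat(np.arange(a * b), n.ravel())   # cell index of each observation
A, B = cell // b, cell % b
y = np.random.default_rng(2).normal(size=cell.size) + cell   # arbitrary simulated responses
ybar = (np.bincount(cell, weights=y) / n.ravel()).reshape(a, b)

def indicators(levels, k):
    """0/1 indicator columns for a factor with k levels."""
    return (levels[:, None] == np.arange(k)).astype(float)

def fitted_ss(X, y):
    """Sum of squares of the least squares fitted values."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    f = X @ theta
    return float(f @ f)

def scheffe_ss_a(ybar, n, w):
    """Scheffe's weighted formula: sum W_i A_i^2 - (sum W_i A_i)^2 / sum W_i."""
    W = 1.0 / (w**2 / n).sum(axis=1)
    A_hat = ybar @ w
    return float((W * A_hat**2).sum() - (W * A_hat).sum()**2 / W.sum())

one = np.ones((y.size, 1))
XA, XB = indicators(A, a), indicators(B, b)
R_a_mu = fitted_ss(np.hstack([one, XA]), y) - fitted_ss(one, y)
R_a_mu_b = fitted_ss(np.hstack([one, XA, XB]), y) - fitted_ss(np.hstack([one, XB]), y)
R2 = scheffe_ss_a(ybar, n, n.sum(axis=0) / N)    # frequency weights
R1 = scheffe_ss_a(ybar, n, np.full(b, 1.0 / b))  # equal weights
print(R_a_mu, R_a_mu_b, R2, R1)
```

The first three values agree, because with proportional frequencies the frequency weights give $W_i = n_{i.}$ and $A_i = \bar{y}_{i..}$, while the equal-weight value stands apart.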

If all cell frequencies are equal ($n_{ij} = n$), we have a balanced design.
In that event, we find that

    $R_0(\alpha \mid \mu) = R_1(\alpha \mid \mu) = R_2(\alpha \mid \mu) =$
    $R_0(\alpha \mid \mu, \beta) = R_1(\alpha \mid \mu, \beta) = R_2(\alpha \mid \mu, \beta) =$
    $R_1(\alpha \mid \mu, \beta, \gamma) = R_2(\alpha \mid \mu, \beta, \gamma)$.

The reduction $R_0(\alpha \mid \mu, \beta, \gamma)$ remains zero, but all others are equal and
identical to the SS(A) given in any elementary statistics book. The restrictions
degenerate to the same set and all $R_1(\ )$ and $R_2(\ )$ test the same hypothesis,
$\alpha_i = 0$. The $R_0(\alpha \mid \mu)$ and $R_0(\alpha \mid \mu, \beta)$ test the hypothesis that the
$\alpha_i + \bar{\gamma}_{i.}$ are equal, where $\bar{\gamma}_{i.} = \frac{1}{b} \sum_j \gamma_{ij}$. This information is
summarized in Table 5.



Insert Table 5 about here 



THE REDUCTION CHOSEN 
Elementary statistics texts such as Hays (1973), McNemar (1962), Glass
and Stanley (1970), Edwards (1968) and Guilford and Fruchter (1973) consider
only the balanced case with restrictions and thus have no choice to make.
Experimental design texts such as Winer (1962) and Kirk (1968) consider
unbalanced and proportional designs and recommend, in our notation, $R(\gamma \mid \mu, \alpha, \beta)$
and $R(\alpha \mid \mu, \beta)$. In his second edition, Winer (1971) also suggests the weighted
squares of means solution, $R_1(\alpha \mid \mu, \beta, \gamma)$. Texts stressing a linear regression
approach (e.g., Kerlinger and Pedhazur (1973) and Overall and Klett (1972))
suggest the choice be made from among $R(\alpha \mid \mu)$, $R(\alpha \mid \mu, \beta)$ and $R(\alpha \mid \mu, \beta, \gamma)$.

Canned computer programs approach the unbalanced design in a wide
variety of ways. Francis (1973) has surveyed a variety of such packages and
reports that some, such as CAROLINA (Psychometric Laboratory, University of
North Carolina) and OSIRIS (University of Michigan, 1970) calculate $R(\alpha \mid \mu)$
and then $R(\beta \mid \mu, \alpha)$, or vice-versa, depending on the order of input of the
factors. A program from North Carolina State (Barr and Goodnight, 1971)
calculates $R(\alpha \mid \mu)$ and $R(\beta \mid \mu)$; the interaction is calculated by
subtraction, yielding a negative sum of squares for the example cited.
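It is easy to see how subtraction can go wrong. The sketch below uses a hypothetical 2 x 2 layout (not the example cited by Francis) in which the cell means are exactly additive and the frequencies are strongly non-proportional. Subtracting $R(\alpha \mid \mu)$ and $R(\beta \mid \mu)$ from the between-cells sum of squares counts the overlap between the two factors twice.

```python
import numpy as np

n = np.array([[4, 1],
              [1, 4]])                 # hypothetical, strongly non-proportional frequencies
ybar = np.array([[0.0, 1.0],
                 [1.0, 2.0]])          # exactly additive cell means, no within-cell error
N = n.sum()
grand = (n * ybar).sum() / N           # grand mean of all observations

ss_cells = (n * ybar**2).sum() - N * grand**2            # between-cells sum of squares
row_means = (n * ybar).sum(axis=1) / n.sum(axis=1)
col_means = (n * ybar).sum(axis=0) / n.sum(axis=0)
R_a_mu = (n.sum(axis=1) * row_means**2).sum() - N * grand**2   # R(alpha | mu)
R_b_mu = (n.sum(axis=0) * col_means**2).sum() - N * grand**2   # R(beta | mu)

interaction_by_subtraction = ss_cells - R_a_mu - R_b_mu
print(ss_cells, R_a_mu, R_b_mu, interaction_by_subtraction)
```

Here the between-cells sum of squares is 8 and each unadjusted main effect reduction is 6.4, so the "interaction" obtained by subtraction is about -4.8, even though the cell means contain no interaction at all.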

Two programs from the "Biomedical Computer Programs" (Dixon, 1970)
provide the user with among the most accurate and general and least expensive
means of performing the analysis. BMDX64 (BMD10V) will automatically
calculate $R(\alpha \mid \mu, \beta, \gamma)$, $R(\beta \mid \mu, \alpha, \gamma)$ and $R(\gamma \mid \mu, \alpha, \beta)$. If the user wishes,
any other reduction can be obtained by means of extra hypotheses cards.
BMD05V is somewhat less automatic and somewhat less general. It requires
that the user input a design matrix and the specific reductions required. In
light of the wide diversity of possible ways to analyze unbalanced data, it
is this author's belief that the less automated approach of BMD05V is to
be preferred, in that it forces the user to be aware of what procedure is
being used.

A word of caution is in order at this point. A variety of computer
programs are in common use which claim to handle unbalanced data. Many of these
provide little (or no) documentation as to the method of analysis. Some, such
as AVAR23 (Veldman, 1967), in fact perform an unweighted means analysis of
variance. This is not an exact least squares solution except in the balanced
case. It is always desirable, prior to relying on any computer program, to
submit a test problem to it. Overall and Klett (1972) provide one such set
of test data (see the footnote in Kutner (1974) for a transcribing error)
along with a large variety of different reductions.

RECOMMENDATIONS

Only one definitive recommendation will be offered: use $R(\gamma \mid \mu, \alpha, \beta)$
as the interaction SS. This is no major breakthrough, since most
use this presently. In attempting to choose between reductions for the "A"
main effect, this author generally prefers $R(\alpha \mid \mu, \beta)$. This preference is not
based upon desirable properties of $R(\alpha \mid \mu, \beta)$, but rather on undesirable
properties of its competitors. The reduction $R(\alpha \mid \mu)$ tests hypotheses which
are dependent upon cell frequencies for all but the trivial case of equal
frequencies and the case of proportional frequencies with weighted restrictions.
The reduction $R(\alpha \mid \mu, \beta, \gamma)$ is widely used, but has the disadvantage
of being dependent upon the form of the restrictions placed on the model. One
could argue that, if we always choose unweighted restrictions of the form of
(11), this problem would not exist. However, we then are forced to use an
"unusual" (by widely used textbooks) statistic for the main effects when the
cell sizes are proportional. The preferred reduction, $R(\alpha \mid \mu, \beta)$, always
tests hypotheses which are free of cell frequencies (at least when there are
no empty cells). Its value is not affected by restrictions and its hypothesis
is always reasonable, even with no restrictions. It is true that it tests a
hypothesis which is conditional upon "additivity." This, however, is no real
disadvantage, since interest in a "main effect" is typically present only in
the absence of interaction. Given even the slightest hint of unequal $\gamma_{ij}$,
most researchers will go immediately to simple effects.

In summary, $R(\alpha \mid \mu, \beta)$ is the preferred choice of this author. That
choice was made on the basis of subjective evaluation. It is incumbent
upon each individual to make his or her own decisions.
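TABLE 2

Reductions and Hypotheses for the Restricted Models

Reduction                                 Formula                                         Hypothesis

$R_1(\gamma \mid \mu, \alpha, \beta)$     same as $R_0(\gamma \mid \mu, \alpha, \beta)$   all $\gamma_{ij}^{(1)} = 0$
$R_2(\gamma \mid \mu, \alpha, \beta)$     same as $R_0(\gamma \mid \mu, \alpha, \beta)$   all $\gamma_{ij}^{(2)} = 0$

$R_1(\alpha \mid \mu, \beta, \gamma)$     (14) with the weights of $S_1$                  all $\alpha_i^{(1)} = 0$
$R_2(\alpha \mid \mu, \beta, \gamma)$     (14) with the weights of $S_2$                  all $\alpha_i^{(2)} = 0$

$R_1(\alpha \mid \mu, \beta)$             same as $R_0(\alpha \mid \mu, \beta)$           all $\alpha_i^{(1)} = 0$ | all $\gamma_{ij}^{(1)} = 0$
$R_2(\alpha \mid \mu, \beta)$             same as $R_0(\alpha \mid \mu, \beta)$           all $\alpha_i^{(2)} = 0$ | all $\gamma_{ij}^{(2)} = 0$

$R_1(\alpha \mid \mu)$                    same as $R_0(\alpha \mid \mu)$                  all $\left(\alpha_i^{(1)} + \frac{1}{n_{i.}} \sum_j n_{ij} (\beta_j^{(1)} + \gamma_{ij}^{(1)})\right)$ equal
$R_2(\alpha \mid \mu)$                    same as $R_0(\alpha \mid \mu)$                  all $\left(\alpha_i^{(2)} + \frac{1}{n_{i.}} \sum_j n_{ij} (\beta_j^{(2)} + \gamma_{ij}^{(2)})\right)$ equal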
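TABLE 4

Reductions and Hypotheses for Proportional Cell Frequencies

Reduction                                 Formula                                                   Hypothesis

$R_0(\gamma \mid \mu, \alpha, \beta)$
$R_1(\gamma \mid \mu, \alpha, \beta)$     same as $R_0(\gamma \mid \mu, \alpha, \beta)$             all $\gamma_{ij}^{(1)} = 0$
$R_2(\gamma \mid \mu, \alpha, \beta)$     same as $R_0(\gamma \mid \mu, \alpha, \beta)$             all $\gamma_{ij}^{(2)} = 0$

$R_0(\alpha \mid \mu, \beta, \gamma)$     0                                                         none
$R_1(\alpha \mid \mu, \beta, \gamma)$     does not simplify                                         all $\alpha_i^{(1)} = 0$
$R_2(\alpha \mid \mu, \beta, \gamma)$     $\sum_i n_{i.} \bar{y}_{i..}^2 - N \bar{y}_{...}^2$       all $\alpha_i^{(2)} = 0$

$R_0(\alpha \mid \mu, \beta)$             same as $R_2(\alpha \mid \mu, \beta, \gamma)$             all $\alpha_i = \alpha$ | all $\gamma_{ij} = \gamma$
$R_1(\alpha \mid \mu, \beta)$             same as $R_2(\alpha \mid \mu, \beta, \gamma)$             all $\alpha_i^{(1)} = 0$ | all $\gamma_{ij}^{(1)} = 0$
$R_2(\alpha \mid \mu, \beta)$             same as $R_2(\alpha \mid \mu, \beta, \gamma)$             all $\alpha_i^{(2)} = 0$ | all $\gamma_{ij}^{(2)} = 0$

$R_0(\alpha \mid \mu)$                    same as $R_2(\alpha \mid \mu, \beta, \gamma)$             all $\left(\alpha_i + \sum_j \frac{n_{.j}}{N} (\beta_j + \gamma_{ij})\right)$ equal
$R_1(\alpha \mid \mu)$                    same as $R_2(\alpha \mid \mu, \beta, \gamma)$             all $\left(\alpha_i^{(1)} + \sum_j \frac{n_{.j}}{N} (\beta_j^{(1)} + \gamma_{ij}^{(1)})\right)$ equal
$R_2(\alpha \mid \mu)$                    same as $R_2(\alpha \mid \mu, \beta, \gamma)$             all $\alpha_i^{(2)} = 0$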
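TABLE 5

Reductions and Hypotheses for Equal Cell Frequencies

Reduction                                 Formula                                                                                                                             Hypothesis

$R_0(\gamma \mid \mu, \alpha, \beta)$     $n \sum_i \sum_j \bar{y}_{ij.}^2 - bn \sum_i \bar{y}_{i..}^2 - an \sum_j \bar{y}_{.j.}^2 + N \bar{y}_{...}^2$
$R_1(\gamma \mid \mu, \alpha, \beta)$     same as $R_0(\gamma \mid \mu, \alpha, \beta)$                                                                                       all $\gamma_{ij} = 0$
$R_2(\gamma \mid \mu, \alpha, \beta)$     same as $R_0(\gamma \mid \mu, \alpha, \beta)$                                                                                       all $\gamma_{ij} = 0$

$R_0(\alpha \mid \mu, \beta, \gamma)$     0                                                                                                                                   none
$R_1(\alpha \mid \mu, \beta, \gamma)$     $bn \sum_i \bar{y}_{i..}^2 - N \bar{y}_{...}^2$                                                                                     all $\alpha_i = 0$
$R_2(\alpha \mid \mu, \beta, \gamma)$     same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i = 0$

$R_0(\alpha \mid \mu, \beta)$             same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i = \alpha$ | all $\gamma_{ij} = \gamma$
$R_1(\alpha \mid \mu, \beta)$             same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i = 0$ | all $\gamma_{ij} = 0$
$R_2(\alpha \mid \mu, \beta)$             same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i = 0$ | all $\gamma_{ij} = 0$

$R_0(\alpha \mid \mu)$                    same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i + \bar{\gamma}_{i.}$ equal
$R_1(\alpha \mid \mu)$                    same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i = 0$
$R_2(\alpha \mid \mu)$                    same as $R_1(\alpha \mid \mu, \beta, \gamma)$                                                                                       all $\alpha_i = 0$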